Pearson correlation test (Pearson’s r)#
Goals#
Understand what Pearson’s correlation measures (and what it does not measure).
Run the hypothesis test for correlation and interpret the result.
Implement Pearson’s r and a permutation-based p-value using NumPy only.
Build intuition with Plotly visualizations (scatter, null distribution, sampling variability).
1) What is it used for?#
Pearson’s correlation coefficient r answers a very specific question:
How strong is the linear relationship between two numeric variables?
Typical uses:
Quick EDA: is there a linear trend between x and y?
Feature screening: does a feature move linearly with the target?
Model diagnostics: do residuals correlate with a predictor?
Common misuses / misconceptions:
r = 0 does not mean “no relationship” (it means no linear relationship).
Correlation is not causation.
Pearson r can be very sensitive to outliers.
2) The hypothesis test (what are we testing?)#
Let (\rho) be the population correlation between random variables (X) and (Y).
A standard Pearson correlation test (most common form):
Null: (H_0: \rho = 0) (no linear association)
Alternative:
two-sided: (H_1: \rho \neq 0)
one-sided: (H_1: \rho > 0) or (H_1: \rho < 0)
What does rejecting (H_0) mean?#
It means your sample provides evidence of a non-zero linear association.
It does not automatically mean:
the relationship is strong (effect size can be tiny but “significant” with large n)
the relationship is causal
the relationship is linear everywhere or stable over time
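The large-n caveat can be made concrete with a quick numeric sketch (illustrative numbers, not part of the test itself): for a fixed, negligible correlation of r = 0.05, the classical t statistic t = r·sqrt((n−2)/(1−r²)) grows with n, so the same tiny effect eventually clears the usual 1.96 cutoff.

```python
import numpy as np

# For a fixed tiny correlation, the t statistic grows like sqrt(n),
# so "significant" does not imply "large effect".
r = 0.05
for n in (100, 1_000, 100_000):
    t = r * np.sqrt((n - 2) / (1 - r**2))
    print(f"n={n:>7}  t={t:6.2f}  exceeds 1.96: {t > 1.96}")
```

At n = 100 the statistic is well under 1.96; at n = 100,000 it is far above it, even though r = 0.05 is practically negligible in both cases.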
3) Assumptions (and why a permutation test is useful)#
The classic parametric Pearson test relies on a model where ((X, Y)) is approximately bivariate normal (or at least well-behaved) so that a t-statistic has a known reference distribution.
A permutation test provides an alternative that is often easier to reason about:
Under the null of no association, the pairing between x and y is arbitrary.
If we shuffle y many times and recompute r, we get a null distribution.
The p-value is the fraction of shuffles that produce a correlation as extreme as the observed one.
This permutation approach is:
NumPy-only and conceptually direct
valid under exchangeability (roughly: the pairs are i.i.d. and shuffling breaks any real association)
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from plotly.subplots import make_subplots
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
# Optional: SciPy for the classic parametric p-value (not required for the NumPy-only permutation test)
try:
from scipy import stats
except Exception:
stats = None
def pearson_r(x, y):
x = np.asarray(x, dtype=float).ravel()
y = np.asarray(y, dtype=float).ravel()
if x.shape != y.shape:
raise ValueError(f"x and y must have the same shape; got {x.shape} vs {y.shape}")
if x.size < 2:
raise ValueError("Need at least 2 paired observations")
x_c = x - x.mean()
y_c = y - y.mean()
ssx = float(np.dot(x_c, x_c))
ssy = float(np.dot(y_c, y_c))
if ssx == 0.0 or ssy == 0.0:
raise ValueError("Correlation is undefined when one input is constant")
return float(np.dot(x_c, y_c) / np.sqrt(ssx * ssy))
def ols_line(x, y):
x = np.asarray(x, dtype=float).ravel()
y = np.asarray(y, dtype=float).ravel()
x_mean = x.mean()
y_mean = y.mean()
x_c = x - x_mean
denom = float(np.dot(x_c, x_c))
if denom == 0.0:
raise ValueError("Cannot fit a line when x is constant")
b = float(np.dot(x_c, y - y_mean) / denom)
a = float(y_mean - b * x_mean)
return a, b
def pearson_t_stat(r, n):
if n < 3:
raise ValueError("Need n >= 3 for the t statistic")
r = float(r)
if abs(r) >= 1.0:
return np.inf * np.sign(r)
return r * np.sqrt((n - 2) / (1 - r**2))
def pearson_p_value_parametric(r, n, alternative="two-sided"):
if stats is None:
raise RuntimeError("SciPy is not available; use the permutation test below instead")
t = pearson_t_stat(r, n)
df = n - 2
if alternative == "two-sided":
return float(2 * stats.t.sf(abs(t), df))
if alternative == "greater":
return float(stats.t.sf(t, df))
if alternative == "less":
return float(stats.t.cdf(t, df))
raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
def pearson_p_value_permutation(x, y, n_perm=10_000, alternative="two-sided", rng=None):
if rng is None:
rng = np.random.default_rng()
x = np.asarray(x, dtype=float).ravel()
y = np.asarray(y, dtype=float).ravel()
if x.shape != y.shape:
raise ValueError("x and y must have the same shape")
r_obs = pearson_r(x, y)
r_perm = np.empty(int(n_perm), dtype=float)
for i in range(r_perm.size):
r_perm[i] = pearson_r(x, rng.permutation(y))
if alternative == "two-sided":
extreme = np.abs(r_perm) >= abs(r_obs)
elif alternative == "greater":
extreme = r_perm >= r_obs
elif alternative == "less":
extreme = r_perm <= r_obs
else:
raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
p = float((extreme.sum() + 1) / (r_perm.size + 1))
return r_obs, p, r_perm
def bootstrap_ci_r(x, y, n_boot=5_000, ci=0.95, rng=None):
if rng is None:
rng = np.random.default_rng()
x = np.asarray(x, dtype=float).ravel()
y = np.asarray(y, dtype=float).ravel()
if x.shape != y.shape:
raise ValueError("x and y must have the same shape")
n = x.size
if n < 2:
raise ValueError("Need at least 2 paired observations")
idx = rng.integers(0, n, size=(int(n_boot), n))
xb = x[idx]
yb = y[idx]
xb_c = xb - xb.mean(axis=1, keepdims=True)
yb_c = yb - yb.mean(axis=1, keepdims=True)
num = np.sum(xb_c * yb_c, axis=1)
den = np.sqrt(np.sum(xb_c**2, axis=1) * np.sum(yb_c**2, axis=1))
r_boot = num / den
alpha = (1 - ci) / 2
lo, hi = np.quantile(r_boot, [alpha, 1 - alpha])
return float(lo), float(hi), r_boot
4) Example: compute r and visualize the linear relationship#
We’ll generate a simple synthetic dataset with a positive linear relationship plus noise.
n = 80
x = rng.normal(0, 1, size=n)
y = 0.9 * x + rng.normal(0, 1.2, size=n)
r_obs = pearson_r(x, y)
a, b = ols_line(x, y)
x_grid = np.linspace(x.min(), x.max(), 200)
y_hat = a + b * x_grid
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y, mode="markers", name="data", marker=dict(size=9, opacity=0.8)))
fig.add_trace(go.Scatter(x=x_grid, y=y_hat, mode="lines", name="OLS fit", line=dict(width=3)))
fig.update_layout(
title=f"Scatter plot with fitted line (Pearson r = {r_obs:.3f})",
xaxis_title="x",
yaxis_title="y",
)
fig.show()
Interpreting r#
Sign: r > 0 means y tends to increase when x increases; r < 0 means the opposite.
Magnitude: |r| closer to 1 means points lie closer to a straight line.
A useful identity:
[ r = \frac{\operatorname{cov}(x,y)}{\operatorname{sd}(x)\,\operatorname{sd}(y)} ]
So Pearson r is a standardized covariance:
invariant to shifting and scaling of either variable
sensitive to outliers (because covariance is)
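Both properties are easy to verify numerically. Here is a small self-contained check (a sketch using np.corrcoef rather than the pearson_r function defined above): shifting and positively scaling either variable leaves r unchanged, because standardization divides out location and scale.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 0.7 * x + rng.normal(size=100)

# r is invariant to affine transforms with positive scale:
# replacing x -> 3x - 5 and y -> 0.5y + 2 leaves r unchanged.
r_orig = np.corrcoef(x, y)[0, 1]
r_affine = np.corrcoef(3.0 * x - 5.0, 0.5 * y + 2.0)[0, 1]
print(np.isclose(r_orig, r_affine))  # True
```

Note that a negative scale factor would flip the sign of r while preserving its magnitude.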
5) Running the hypothesis test#
We’ll compute a p-value in two ways:
Permutation test (NumPy only) — our low-level, assumption-light implementation.
Parametric t-test (optional, via SciPy) — the classical Pearson correlation test.
Classic parametric Pearson test (t distribution)#
For paired observations ((x_i, y_i)) and sample size (n), compute the sample correlation (r). Under the classical assumptions (especially approximate bivariate normality) and (H_0: \rho = 0), the test statistic
[ t = r \sqrt{\frac{n-2}{1-r^2}} ]
has a Student t distribution with (n-2) degrees of freedom.
two-sided: (p = 2\,P(T_{n-2} \ge |t|))
one-sided: use the appropriate tail depending on (H_1)
We include this mainly for interpretation and for cross-checking against libraries; the NumPy-only permutation test above does not require this reference distribution.
alpha = 0.05
r_obs, p_perm, r_perm = pearson_p_value_permutation(x, y, n_perm=20_000, alternative="two-sided", rng=rng)
print(f"Observed r = {r_obs:.4f}")
print(f"Permutation p-value (two-sided) = {p_perm:.4g}")
if stats is not None:
p_param = pearson_p_value_parametric(r_obs, n=x.size, alternative="two-sided")
t_stat = pearson_t_stat(r_obs, n=x.size)
print(f"t statistic = {t_stat:.4f} (df={x.size-2})")
print(f"Parametric p-value (two-sided) = {p_param:.4g}")
else:
print("SciPy not available: skipping parametric p-value")
print(f"Decision at alpha={alpha}: {'reject H0' if p_perm < alpha else 'fail to reject H0'}")
Observed r = 0.6010
Permutation p-value (two-sided) = 5e-05
t statistic = 6.6416 (df=78)
Parametric p-value (two-sided) = 3.756e-09
Decision at alpha=0.05: reject H0
Visual: permutation null distribution#
Under the null (no association), shuffled pairings should produce correlations near 0. The p-value is the fraction of shuffled correlations that are at least as extreme as the observed one.
thr = abs(r_obs)
mask_extreme = np.abs(r_perm) >= thr
fig = go.Figure()
fig.add_trace(
go.Histogram(
x=r_perm,
nbinsx=60,
name="permuted r (null)",
marker=dict(color="rgba(120,120,120,0.6)"),
histnorm="probability density",
)
)
fig.add_trace(
go.Histogram(
x=r_perm[mask_extreme],
nbinsx=60,
name=f"|r| >= {thr:.3f} (counts toward p)",
marker=dict(color="rgba(220,0,0,0.65)"),
histnorm="probability density",
)
)
fig.add_vline(x=r_obs, line_width=3, line_color="black")
fig.add_vline(x=-r_obs, line_width=3, line_color="black", line_dash="dash")
fig.update_layout(
barmode="overlay",
title=f"Permutation null distribution (p ≈ {p_perm:.4g})",
xaxis_title="Pearson r under H0 (shuffled)",
yaxis_title="density",
)
fig.show()
What the p-value means (and doesn’t mean)#
A (two-sided) p-value answers:
If (\rho = 0) (no linear association), how often would we see a sample correlation at least as extreme as the one we observed?
It is not:
the probability that (H_0) is true
the probability that the relationship is “real”
a measure of effect size (that is r itself)
A good report includes both:
effect size: r
uncertainty: CI and/or p-value
6) Confidence intervals (uncertainty on the effect size)#
A p-value tells you about compatibility with (\rho = 0). A confidence interval tells you a range of plausible (\rho) values.
We’ll compute a bootstrap percentile CI for r by resampling the paired observations.
lo, hi, r_boot = bootstrap_ci_r(x, y, n_boot=8_000, ci=0.95, rng=rng)
print(f"Bootstrap 95% CI for r: [{lo:.3f}, {hi:.3f}]")
fig = go.Figure()
fig.add_trace(go.Histogram(x=r_boot, nbinsx=60, histnorm="probability density", name="bootstrap r"))
fig.add_vline(x=r_obs, line_width=3, line_color="black")
fig.add_vline(x=lo, line_width=3, line_color="green")
fig.add_vline(x=hi, line_width=3, line_color="green")
fig.update_layout(
title="Bootstrap distribution of r with 95% CI",
xaxis_title="bootstrap Pearson r",
yaxis_title="density",
)
fig.show()
Bootstrap 95% CI for r: [0.440, 0.725]
7) Pitfall: outliers can dominate Pearson r#
Because Pearson r is based on means, variances, and covariance, a single extreme point can strongly change it.
We’ll compare the same dataset with and without one high-leverage outlier.
x0 = rng.normal(0, 1, size=40)
y0 = 0.5 * x0 + rng.normal(0, 1, size=40)
r0 = pearson_r(x0, y0)
# Add one outlier
x1 = np.append(x0, 8.0)
y1 = np.append(y0, -8.0)
r1 = pearson_r(x1, y1)
fig = make_subplots(rows=1, cols=2, subplot_titles=(f"No outlier (r={r0:.3f})", f"With outlier (r={r1:.3f})"))
fig.add_trace(go.Scatter(x=x0, y=y0, mode="markers", marker=dict(size=9, opacity=0.8), showlegend=False), row=1, col=1)
fig.add_trace(go.Scatter(x=x1, y=y1, mode="markers", marker=dict(size=9, opacity=0.8), showlegend=False), row=1, col=2)
fig.update_xaxes(title_text="x")
fig.update_yaxes(title_text="y")
fig.update_layout(title="A single point can change correlation a lot")
fig.show()
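One common mitigation is a rank-based (Spearman) correlation, which is touched on again in the exercises. The sketch below uses its own illustrative data (not the x1/y1 arrays above) and a hypothetical spearman_r helper; the double-argsort ranking assumes no ties in the data.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(size=40)
x_out = np.append(x, 8.0)    # one high-leverage outlier
y_out = np.append(y, -8.0)

def spearman_r(a, b):
    # Spearman r is Pearson r computed on the ranks of each variable.
    # argsort-of-argsort gives 0..n-1 ranks (valid when there are no ties).
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

r_pearson = np.corrcoef(x_out, y_out)[0, 1]
r_spearman = spearman_r(x_out, y_out)
print(f"Pearson r = {r_pearson:.3f}, Spearman r = {r_spearman:.3f}")
```

Because ranks bound the outlier's influence (it can only occupy the extreme rank, not an extreme value), the Spearman estimate stays much closer to the correlation of the clean data.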
8) Pitfall: a strong relationship can have r ≈ 0 (nonlinear patterns)#
Pearson r measures linear association.
Example: if y = x^2 and x is symmetric around 0, the relationship is strong but not linear, and Pearson r can be close to 0.
x_nl = rng.uniform(-2.5, 2.5, size=200)
y_nl = x_nl**2 + rng.normal(0, 0.2, size=x_nl.size)
r_nl = pearson_r(x_nl, y_nl)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_nl, y=y_nl, mode="markers", marker=dict(size=7, opacity=0.7)))
fig.update_layout(
title=f"Nonlinear relationship (Pearson r = {r_nl:.3f} can be misleading)",
xaxis_title="x",
yaxis_title="y",
)
fig.show()
9) Sampling variability: r depends on n#
Even if the true correlation is (\rho = 0), sample correlations won’t be exactly 0.
As n increases, the sampling distribution of r tightens around the true value.
def simulate_r_under_rho(rho, n, n_sims=4000, rng=None):
if rng is None:
rng = np.random.default_rng()
z1 = rng.normal(size=(n_sims, n))
z2 = rng.normal(size=(n_sims, n))
x = z1
y = rho * z1 + np.sqrt(1 - rho**2) * z2
x_c = x - x.mean(axis=1, keepdims=True)
y_c = y - y.mean(axis=1, keepdims=True)
num = np.sum(x_c * y_c, axis=1)
den = np.sqrt(np.sum(x_c**2, axis=1) * np.sum(y_c**2, axis=1))
return num / den
rho = 0.0
r_n20 = simulate_r_under_rho(rho, n=20, n_sims=4000, rng=rng)
r_n80 = simulate_r_under_rho(rho, n=80, n_sims=4000, rng=rng)
r_n300 = simulate_r_under_rho(rho, n=300, n_sims=4000, rng=rng)
fig = go.Figure()
fig.add_trace(go.Violin(y=r_n20, name="n=20", box_visible=True, meanline_visible=True))
fig.add_trace(go.Violin(y=r_n80, name="n=80", box_visible=True, meanline_visible=True))
fig.add_trace(go.Violin(y=r_n300, name="n=300", box_visible=True, meanline_visible=True))
fig.update_layout(
title="Sampling distribution of r when true rho = 0",
yaxis_title="sample Pearson r",
)
fig.show()
10) How to report and interpret results#
A good report includes:
n (sample size)
r (effect size)
a confidence interval for (\rho) (recommended)
p-value and the alternative hypothesis
Example phrasing:
“We found a positive linear association between X and Y (Pearson r = 0.42, 95% bootstrap CI [0.20, 0.60], permutation p = 0.003, n = 80).”
Interpretation checklist#
Is the association linear (check a scatter plot)?
Are there outliers or clusters driving the correlation?
Is the result practically meaningful (not just statistically significant)?
Could a confounder explain both variables?
11) Practical usage (SciPy)#
If SciPy is available, scipy.stats.pearsonr computes:
Pearson r
the classical (parametric) p-value under the t-test reference distribution
We’ll compare it to our NumPy computations.
if stats is None:
print("SciPy not available")
else:
r_scipy, p_scipy = stats.pearsonr(x, y)
print(f"SciPy pearsonr: r={r_scipy:.4f}, p={p_scipy:.4g}")
print(f"NumPy pearson_r: r={pearson_r(x, y):.4f}")
SciPy pearsonr: r=0.6010, p=3.756e-09
NumPy pearson_r: r=0.6010
Exercises#
Create a dataset where Pearson r is near 0 but there is a strong nonlinear relationship. Plot it.
Simulate correlated data for (\rho \in \{0, 0.2, 0.5\}) and compare how often you reject (H_0) at alpha = 0.05 as n increases.
Add a single outlier to a moderately correlated dataset and measure how much r changes.
Implement Spearman correlation (rank-based) using NumPy and compare it on the nonlinear example.
References#
Student’s t relationship for Pearson correlation: standard results in most mathematical statistics texts.
Permutation tests / randomization tests: Fisher (1935), Good (2005), and many modern applied stats resources.
SciPy documentation: scipy.stats.pearsonr.